AITopics | inverse reinforcement learning

A.1 Additional lemma Lemma 9 Let s0 be the starting state, let (a)n represent a sequence of actions and let M = Z(ar)Z(ar 1)...Z(a1) i.e., the product of matrices in {Z(a)}left multiplied in order of the sequence Proof Here we use proof by induction. We note that the interchange of the integral and infinite summation is justified by Section 3.7 in [5], since the coefficients Z We can then conclude the statement of the lemma by induction. A.2 Proof of Proposition 1 Proof By Lemma 9, given a fixed sequence of actions (a)n, the r-th state sr under this sequence of actions starting from state s0 has a distribution that can be represented over the basis {φn(s)}. Therefore, the expected reward under any sequence of actions for reward Ris the same as for the projected reward R0 for any state sr where r > 0. The reward at the starting state, R(s0) does not depend on the policy. Therefore, the value of R(s0) does not change whether a policy is optimal or not.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Add feedback

210f760a89db30aa72ca258a3483cc7f-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 01:51:52 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Identifiability and Generalizability from Multiple Experts in Inverse Reinforcement Learning

Neural Information Processing SystemsApr-24-2026, 08:34:57 GMT

While Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment, Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert's behavior. It is well known that, in general, various reward functions can lead to the same optimal policy, and hence, IRL is ill-defined. However, [1] showed that, if we observe two or more experts with different discount factors or acting in different environments, the reward function can under certain conditions be identified up to a constant. This work starts by showing an equivalent identifiability statement from multiple experts in tabular MDPs based on a rank condition, which is easily verifiable and is shown to be also necessary. We then extend our result to various different scenarios, i.e., we characterize reward identifiability in the case where the reward function can be represented as a linear combination of given features, making it more interpretable, or when we have access to approximate transition matrices. Even when the reward is not identifiable, we provide conditions characterizing when data on multiple experts in a given environment allows to generalize and train an optimal agent in a new environment. Our theoretical results on reward identifiability and generalizability are validated in various numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

Europe (0.94)
North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

cc7e2b878868cbae992d1fb743995d8f-Paper.pdf

Neural Information Processing SystemsApr-22-2026, 05:28:44 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Cooperative Inverse Reinforcement Learning

Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, Anca Dragan

Neural Information Processing SystemsApr-22-2026, 01:53:37 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Showing versus doing: Teaching by demonstration

Mark K. Ho, Michael Littman, James MacGlashan, Fiery Cushman, Joe Austerweil, Joseph L. Austerweil

Neural Information Processing SystemsApr-21-2026, 22:09:51 GMT

People often learn from others' demonstrations, and inverse reinforcement learning (IRL) techniques have realized this capacity in machines. In contrast, teaching by demonstration has been less well studied computationally. Here, we develop a Bayesian model for teaching by demonstration. Stark differences arise when demonstrators are intentionally teaching (i.e.

Add feedback

Generative Adversarial Imitation Learning

Neural Information Processing SystemsMar-17-2026, 11:07:20 GMT

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

adversarial imitation learning jonathan ho, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Filters

Collaborating Authors

inverse reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

da409884a933ecbc4af03338111bf6aa-Paper-Conference.pdf

56c51a39a7c77d8084838cc920585bd0-Paper.pdf

4e55139e019a58e0084f194f758ffdea-Paper.pdf

Supplementary material: Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees AProofs of lemmas and theorems

210f760a89db30aa72ca258a3483cc7f-Paper.pdf

Identifiability and Generalizability from Multiple Experts in Inverse Reinforcement Learning

cc7e2b878868cbae992d1fb743995d8f-Paper.pdf

Cooperative Inverse Reinforcement Learning

Showing versus doing: Teaching by demonstration

Generative Adversarial Imitation Learning